Learning to Rerank Top-K Schema Matches
Authors
Abstract
We propose a learning algorithm that uses a novel set of features to rerank a list of top-K schema matches and improve the ranking of the best match. We provide a bound on the size of the initial match list, relating the number of matches to a desired level of confidence of finding the best match. We also propose using matching predictors as features in the learning task, and tailor nine new matching predictors for this purpose. A large-scale empirical evaluation on a real-world benchmark shows the effectiveness of the proposed algorithmic solution.
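The reranking idea in the abstract can be illustrated with a minimal sketch. It assumes each candidate match in the top-K list already carries a vector of predictor values and that a weight vector has been learned elsewhere; the linear scoring function, the names `score` and `rerank`, and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical candidate: (match id, matching-predictor values used as features).
Candidate = Tuple[str, Sequence[float]]


def score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear score: weighted sum of predictor values (an assumed model form)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


def rerank(candidates: List[Candidate], weights: Sequence[float]) -> List[str]:
    """Return match ids ordered by descending score; ties keep input order."""
    ranked = sorted(candidates, key=lambda c: score(c[1], weights), reverse=True)
    return [match_id for match_id, _ in ranked]


# Illustrative values only, not results from the paper.
top_k = [
    ("match_a", [0.9, 0.2, 0.1]),
    ("match_b", [0.7, 0.8, 0.6]),
    ("match_c", [0.4, 0.5, 0.9]),
]
weights = [0.2, 0.5, 0.3]  # assumed to come from some previously trained ranker
reranked = rerank(top_k, weights)
print(reranked)
```

In this sketch the initial order (a, b, c) stands in for the matcher's own ranking, and the learned weights move a different candidate to the top. The paper's actual learner and feature set may differ substantially.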
Similar resources
Learning to Match Schemata using Predictors
We propose a learning algorithm that utilizes an innovative set of features to re-rank a list of top-K matches and improves upon the ranking of the best match. We provide a bound on the size of an initial match list, tying the number of matches in a desired level of confidence for finding the best match. We also propose the use of schema matching predictors as features in the learning task, and...
Actively Soliciting Feedback for Query Answers in Keyword Search-Based Data Integration
The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data experts. One promising approach is to avoid u...
Ensemble-based Top-k Recommender System Considering Incomplete Data
Recommender systems have been widely used in e-commerce applications. They are a subclass of information filtering systems, used either to predict whether a user will prefer an item (prediction problem) or to identify a set of k items that will be of interest to the user (Top-k recommendation problem). Demanding sufficient ratings to make robust predictions and suggesting qualified recommendations are two si...
Point-Wise Approach for Yandex Personalized Web Search Challenge
The paper describes a solution for the Yandex Personalized Web Search Challenge. The goal of the challenge is to rerank the top ten web search query results to bring the most personally relevant results to the top, thereby improving search quality. The paper focuses on feature engineering for learning to rank in web search, including a novel pair-wise feature, short- and long-term personal navigation...
TOP: A Compiler-Based Framework for Optimizing Machine Learning Algorithms through Generalized Triangle Inequality
This paper describes our recent research progress on generalizing the triangle inequality (TI) to optimize Machine Learning algorithms that involve either vector dot products (e.g., Neural Networks) or distance calculations (e.g., KNN, KMeans). The progress includes a new form of TI named Angular Triangular Inequality, abstractions to enable unified treatment of various ML algorithms, and TOP, a co...